    EVALUATION OF INTELLIGIBILITY AND SPEAKER SIMILARITY OF VOICE TRANSFORMATION

    Voice transformation refers to a class of techniques that modify voice characteristics, either to conceal a speaker's identity or to mimic the voice of another speaker. Its applications include automatic dialogue replacement and voice generation for people with voice disorders. This diversity of applications makes evaluating voice transformation a challenging task. The objective of this research is to propose a framework for evaluating intentional voice transformation techniques. Our proposed framework is based on two fundamental qualities: intelligibility and speaker similarity. Intelligibility refers to the clarity of the speech content after voice transformation, and speaker similarity measures how well the modified output disguises the source speaker. We measure intelligibility with word error rates and speaker similarity with the likelihood of identifying the correct speaker. The novelty of our approach is that we consider whether similarly transformed training data are available to the recognizer. We demonstrate that this factor plays a significant role in intelligibility and speaker similarity for both human testers and automated recognizers. Using our proposed framework, we thoroughly test two classes of voice transformation techniques: pitch distortion and voice conversion. We apply our results to patients with voice hypertension using video self-modeling, and preliminary results are presented.
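
The two measures named in this abstract can be illustrated with a minimal sketch. It assumes word error rate is computed in the usual way, as word-level edit distance divided by the number of reference words, and that speaker similarity is summarized as the fraction of trials in which the true speaker is identified. The function names and the whitespace tokenization are illustrative assumptions, not the authors' implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (sub + del + ins) divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def identification_rate(true_speakers, identified_speakers) -> float:
    """Fraction of trials in which the identified speaker is the true one."""
    if len(true_speakers) != len(identified_speakers) or not true_speakers:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(t == g for t, g in zip(true_speakers, identified_speakers))
    return hits / len(true_speakers)
```

Under this reading, a transformation meant to disguise a speaker would aim for a low word error rate together with a low identification rate.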

    Automatic Content Generation for Video Self Modeling

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of him- or herself. Its effectiveness in rehabilitation and education has been repeatedly demonstrated, but technical challenges remain in creating video content that depicts previously unseen behaviors. In this paper, we propose a novel system that re-renders new talking-head sequences suitable for VSM treatment of patients with voice disorders. After the raw footage is captured, a new speech track is either synthesized using text-to-speech or selected, based on voice similarity, from a database of clean speech. Voice conversion is then applied to match the new speech to the original voice. Time markers extracted from the original and new speech tracks are used to re-sample the video track for lip synchronization. We use an adaptive re-sampling strategy to minimize motion jitter, and apply bilinear and optical-flow-based interpolation to preserve image quality. Both objective measurements and subjective evaluations demonstrate the effectiveness of the proposed techniques.
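
The marker-based re-sampling step can be sketched as follows. This is a simplified illustration, not the adaptive strategy or the interpolation described in the abstract: it assumes matched, strictly increasing time markers (in seconds) for the new speech and the original recording, maps each output frame time through a piecewise-linear warp, and picks the nearest original frame. All names are hypothetical.

```python
from bisect import bisect_right


def map_time(t, new_markers, orig_markers):
    """Piecewise-linear map from new-speech time to original-video time."""
    if len(new_markers) != len(orig_markers) or len(new_markers) < 2:
        raise ValueError("need at least two matched markers")
    # Clamp times that fall outside the marked range.
    if t <= new_markers[0]:
        return orig_markers[0]
    if t >= new_markers[-1]:
        return orig_markers[-1]
    k = bisect_right(new_markers, t) - 1  # segment containing t
    frac = (t - new_markers[k]) / (new_markers[k + 1] - new_markers[k])
    return orig_markers[k] + frac * (orig_markers[k + 1] - orig_markers[k])


def resample_frame_indices(new_markers, orig_markers, fps, n_orig_frames):
    """For each output frame, return the index of the nearest original frame."""
    duration = new_markers[-1] - new_markers[0]
    n_out = int(duration * fps + 0.5) + 1
    indices = []
    for n in range(n_out):
        t_new = new_markers[0] + n / fps
        t_orig = map_time(t_new, new_markers, orig_markers)
        idx = int(t_orig * fps + 0.5)  # round half up (times are non-negative)
        indices.append(min(max(idx, 0), n_orig_frames - 1))
    return indices
```

Nearest-frame selection repeats or drops frames wherever the warp stretches or compresses time, which is the kind of motion jitter that an adaptive scheme and frame interpolation are meant to reduce.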

    Automatic Video Self Modeling for Voice Disorder

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of him- or herself. In the field of speech-language pathology, VSM has been used successfully to treat language in children with autism and in individuals with the fluency disorder of stuttering. Technical challenges remain in creating VSM content that depicts previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed system replaces the coarse speech with clean, healthier speech that resembles the patient’s original voice. The replacement speech is either synthesized using a text-to-speech engine or selected from a database of clean speech based on a voice similarity metric. To realign the replacement speech with the original video, we propose a novel audiovisual algorithm that combines audio segmentation with lip-state detection to identify corresponding time markers in the audio and video tracks. Lip synchronization is then accomplished with an adaptive video re-sampling scheme that minimizes motion jitter and preserves spatial sharpness. Results of both objective measurements and subjective evaluations on a dataset with 31 subjects demonstrate the effectiveness of the proposed techniques.
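
The selection of replacement speech by voice similarity can be illustrated with a small sketch. The abstract does not specify the similarity metric or the voice features, so the cosine similarity and the fixed-length feature vectors used here are assumptions made purely for illustration, as are the function and variable names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def most_similar_voice(patient_features, database):
    """Return the key of the database entry closest to the patient's voice.

    `database` maps a (hypothetical) speaker id to that speaker's feature vector.
    """
    if not database:
        raise ValueError("database is empty")
    return max(database,
               key=lambda name: cosine_similarity(patient_features, database[name]))
```

For example, with `database = {"a": [1, 0], "b": [0.9, 0.1], "c": [0, 1]}` and a patient vector of `[1, 0.1]`, the call selects `"b"`.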

    A scoring system derived from electronic health records to identify patients at high risk for noninvasive ventilation failure

    Objective: To develop and validate a clinical risk prediction score for noninvasive ventilation (NIV) failure, defined as intubation after a trial of NIV in non-surgical patients. Design: Retrospective cohort study of a multihospital electronic health record database. Patients: Non-surgical adult patients receiving NIV as the first method of ventilation within two days of hospitalization. Measurement: The primary outcome was intubation after a trial of NIV. We used a non-random split of the cohort based on year of admission for model development and validation. We included subjects admitted in 2010-2014 to develop a risk prediction model and built a parsimonious risk scoring model using multivariable logistic regression. We validated the model in the cohort of subjects hospitalized in 2015 and 2016. Main results: Of the 47,749 patients started on NIV, 11.7% were intubated. Compared with patients in whom NIV succeeded, those who were intubated had higher mortality (25.2% vs. 8.9%). The strongest independent predictors of intubation were organ failure, principal diagnosis group (substance abuse/psychosis, neurological conditions, pneumonia, and sepsis), use of invasive ventilation in the prior year, low body mass index, and tachypnea. The c-statistic was 0.81, 0.80, and 0.81 in the derivation, validation, and full cohorts, respectively. We constructed three risk categories from the scoring system built on the full cohort; the median and interquartile range of the risk of intubation were 2.3% [1.9%-2.8%] for the low-risk category, 9.3% [6.3%-13.5%] for the intermediate-risk category, and 35.7% [31.0%-45.8%] for the high-risk category. Conclusions: In patients started on NIV, we found that, in addition to factors known to be associated with intubation, neurological, substance abuse, or psychiatric diagnoses were highly predictive of intubation. The prognostic score that we have developed may provide quantitative guidance for decision-making in patients who are started on NIV.
Keywords: Acute respiratory failure; Intubation; Mechanical ventilation; Predictive score; noninvasive ventilation failure
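
The validation design in this abstract, a split by admission year with discrimination summarized by the c-statistic, can be sketched as follows. The c-statistic is computed here by its standard definition: the proportion of (intubated, not intubated) patient pairs in which the intubated patient received the higher predicted risk, with ties counted as one half. The record layout and function names are hypothetical, and no study data are reproduced.

```python
def split_by_year(records, last_development_year=2014):
    """Non-random split: earlier admissions for development, later for validation."""
    development = [r for r in records if r["year"] <= last_development_year]
    validation = [r for r in records if r["year"] > last_development_year]
    return development, validation


def c_statistic(risks, outcomes):
    """Concordance between predicted risks and binary outcomes (1 = intubated)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))
```

The pairwise loop is quadratic in cohort size, which is acceptable for illustration; on a cohort of tens of thousands of patients a rank-based computation would be the practical choice.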